Streamlining Inter-Operation Communication in Fine-Grain Parallel Processors
Abstract
Traditionally, register files have been the primary agent for inter-operation communication in load/store architectures. As processors start issuing multiple instructions per cycle, a centralized register file can easily become a bottleneck. This paper analyzes the register file traffic in a load/store architecture with a view to motivating the development of alternate inter-operation communication mechanisms that reduce the bandwidth demanded of a centralized register file. We first provide metrics to characterize the register traffic. These metrics deal with the degree and locality of use of the register instances created. We then present the results of a simulation study that uses the MIPS R2000 architecture and the SPEC benchmark programs. We have two major results. First, a large number of the register instances are used only once, and the average degree of use of register instances is about 2. Second, most of the register instances are used up soon after they are created (within about 30-40 instructions). This suggests that alternate inter-operation communication mechanisms that exploit the temporal locality of use of register instances are likely to be effective in reducing the traffic burden on the centralized register file. The second result was pivotal in the design of the distributed register file for the multiscalar processing paradigm.
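The two metrics named in the abstract, degree of use and locality of use of a register instance, can be illustrated with a small sketch. This is not the paper's simulator: the trace format, the function names, and the exact definitions used here are assumptions made for illustration. A register instance is taken to begin at a write and end at the next write to the same register; each source-operand read counts as one use; and locality is measured as the distance, in instructions, from creation to last use.

```python
def register_instance_stats(trace):
    """Return one (degree_of_use, creation_to_last_use_distance) pair per instance.

    `trace` is a hypothetical format: a list of (dest, srcs) tuples, where
    `dest` is a register name or None and `srcs` is a tuple of register names.
    """
    live = {}       # register name -> [created_at, use_count, last_use_at]
    finished = []   # instances that were ended by a later write
    for i, (dest, srcs) in enumerate(trace):
        # Reads are handled before the write, so `add r1, r1, r2`
        # counts as a use of the OLD r1 instance.
        for reg in srcs:
            if reg in live:
                live[reg][1] += 1
                live[reg][2] = i
        if dest is not None:
            if dest in live:
                finished.append(live[dest])   # overwritten: instance ends
            live[dest] = [i, 0, i]            # a new instance is created
    finished.extend(live.values())            # instances still live at the end
    return [(uses, last - created) for created, uses, last in finished]


def summarize(stats):
    """Return (fraction of instances used exactly once, mean degree of use)."""
    degrees = [uses for uses, _ in stats]
    used_once = sum(1 for d in degrees if d == 1) / len(degrees)
    return used_once, sum(degrees) / len(degrees)


# A tiny made-up trace (not taken from the SPEC benchmarks).
trace = [
    ("r1", ()),             # 0: load immediate into r1
    ("r2", ()),             # 1: load immediate into r2
    ("r3", ("r1", "r2")),   # 2: r3 = r1 + r2
    ("r1", ("r3",)),        # 3: r1 = r3 + const (ends the first r1 instance)
    (None, ("r1", "r3")),   # 4: store r1 to the address in r3
]
stats = register_instance_stats(trace)
once_fraction, mean_degree = summarize(stats)
```

Run over a real instruction trace, a histogram of the second element of each pair would show how soon after creation instances stop being used, which is the kind of temporal locality the abstract reports.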
Similar resources
Near fine grain parallel processing using a multiprocessor with MAPLE
The multi-grain parallelizing scheme is an effective parallelizing scheme that exploits parallelism at several levels of a sequential program: coarse-grain (macro-dataflow), medium-grain (loop-level parallelizing) and near-fine-grain (statement-level parallelizing). A multiprocessor, ASCA, is designed for efficient execution of multi-grain parallelized programs. A processing element called MAPLE are mai...
Supporting tasks with adaptive groups in data parallel programming
A set of communication operations is defined which allows a form of task parallelism to be achieved in a data parallel architecture. The set of processors can be subdivided recursively into groups, and a communication operation inside a group never conflicts with communications taking place in other groups. The groups may be subdivided and recombined at any time, allowing the task structure to ...
Constrained Fine-Grain Parallel Sparse Matrix Distribution
We consider how to distribute sparse matrices among processors to reduce communication cost in parallel sparse matrix computations, in particular, sparse matrix-vector multiplication. We allow 2d distributions, where the distribution (partitioning) is not constrained to just rows or columns. The fine-grain model is a 2d distribution introduced in [2] where nonzeros can be assigned to processors...
Fine grain parallelism on a MIMD machine using FPGAs
Current MIMD machines are used for coarse-grain parallelism and also offer message passing mechanisms to deal with inter-processor communications. But these mechanisms lack efficiency in fine-grain parallel applications such as systolic computation. This article presents the use of an FPGA chip to set up a fast systolic communication agent on a linear asynchronous network of Transputer processors; ...
Experience with Fine-Grain Communication in EM-X Multiprocessor for Parallel Sparse Matrix Computation
Sparse matrix problems require a communication paradigm different from those used in conventional distributed-memory multiprocessors. We present in this paper how fine-grain communication can help obtain high performance in the experimental distributed-memory multiprocessor, EM-X, developed at ETL, which can handle fine-grain communication very efficiently. The sparse matrix kernel, Conjugate G...
Publication date: 1992